(57) Abstract

The invention relates to a device for processing temporally successive data packets. Said
device comprises at least one decoder, a plurality of processors, a memory allocated to each
processor, and at least one coding device. The aim of the invention is to increase the
processing speed, especially for the post-processing of video signals, and to permit
cost-effective production. To this end, each processor processes one data packet. The invention
also relates to a method for processing temporally successive data packets in the following
steps: (i) decoding one of the temporally successive data packets; (ii) processing the data
packet by means of the processor; and (iii) coding the processed data packet. The invention
further relates to an arrangement for processing temporally successive data packets.

The present invention pertains to an apparatus for processing of temporally sequential
data packets according to the preamble of Claim 1. Furthermore, the present invention
pertains to a method for processing of sequential data packets according to Claim 27. Finally, the
present invention pertains to an apparatus according to the preamble of Claim 35.

When using computer based image processing, computing times are encountered in the 
manipulation of the data which result in considerable wait times, depending on the complexity of 
the calculations and the size and quantity of the data to be handled. 

The use of computers for video processing (nonlinear editing, NLE) increases these
problems, since one second of video typically consists of 25 frames of 720 x 576 image points
(625/50 systems, e.g., PAL) or 30 frames of 720 x 480 image points (525/60 systems, e.g., NTSC). In the
future, the volume of data will continue to increase, since new, digital television standards will 
have even greater resolutions. The development of TV technology is heading toward very high 
resolution formats, with data rates increasing by a factor of 4 - 8. 
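
As a rough illustration of the data volumes mentioned above, the following sketch computes the
uncompressed pixel and byte rates for the two frame formats. The figure of 2 bytes per image
point (typical of 8-bit 4:2:2 sampling) is an assumption made for illustration only and is not
stated in the text; the function names are likewise hypothetical.

```python
# Rough uncompressed data-rate estimate for the two frame formats named above.
# ASSUMPTION: 2 bytes per image point (e.g. 8-bit 4:2:2 sampling); not from the source.
BYTES_PER_PIXEL = 2

def pixel_rate(width, height, fps):
    """Image points per second for a stream of width x height frames."""
    return width * height * fps

def byte_rate(width, height, fps, bytes_per_pixel=BYTES_PER_PIXEL):
    """Uncompressed bytes per second under the bytes-per-pixel assumption."""
    return pixel_rate(width, height, fps) * bytes_per_pixel

pal = byte_rate(720, 576, 25)    # 625/50 system
ntsc = byte_rate(720, 480, 30)   # 525/60 system
print(f"625/50: {pal / 1e6:.1f} MB/s, 525/60: {ntsc / 1e6:.1f} MB/s")
```

Under this assumption both formats carry the same number of image points per second, so a
fourfold to eightfold increase in data rate would apply equally to either system.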

Compression methods are therefore known which reduce the resultant data rate to a scale
that can be handled by today's low-cost hard disc systems. Some compression methods are, e.g.,
MJPEG, MPEG, DV (all lossy) or RLE (lossless).

In the reprocessing of video signals, scenes are shortened, changed, and recombined. In 
this case, manipulations occur, such as subtitling, transition effects, etc. 

At the end of a manipulation or at the end of all manipulations in a time unit (frame), the 
sequential calculation of all layers included in this frame is always required. This process is also 
known as "mixdown." All manipulations and the mixdown are carried out exclusively in the 
transparent layer. A direct manipulation of compressed image data is not possible, so that
in systems which operate on compressed data, a renewed compression must always follow the
manipulation.

The execution of the manipulations and the computing of all participating layers into a
frame is called rendering. There are essentially two requirements for rendering:

-Real time/synchronous: The operation (or even several operations simultaneously) for a 
frame can be carried out within 1/25 second (for 625/50 systems), that is, synchronous with the 
replay. If the sequence is replayed, then all manipulations can be carried out immediately and 
viewed. Real time, state of the art rendering requires specially developed hardware for all the 
most common operations. The more complex the effects (e.g., 3D transitions), and the more 
operations which can be carried out simultaneously (for instance, simultaneous color correction 
on 2 layers, transition between them, and a title above the result), the more expensive is the 
needed hardware, and therefore the more complex the underlying architecture. 


-Asynchronous: Before a frame can be replayed, it must be calculated completely
(including all operations on all layers). Usually, the entire sequence must be fully calculated
before the result can be seen. In this case, a "parser" analyzes the sequence frame by frame and,
where necessary, calculates those portions which have not yet been calculated. The
calculation is usually handled by the primary processor(s) of the computer. In most cases,
asynchronous rendering is slower than real time rendering.

Qualitatively the known NLE systems often differ considerably. The algorithms used for 
calculation and the interpolation are decisive for the attainable quality. Complicated calculation 
techniques of course require longer computing times. 

However, with regard to video editing, it is not always necessary to work permanently at
the best resolution and quality. To judge an effect or the result of a setting, immediate
interactivity and the ability to evaluate the result immediately in the temporal sequence are more
important. In this case, a lower resolution (smaller images, e.g., 1/4) and/or calculation in
preview quality (draft) is satisfactory.

Computing systems with several processors (multiprocessor systems) are used in many
applications. High performance computers in this case use a known, "massively parallel"
architecture, in which many (up to several hundred) processors simultaneously compute
various partial portions (threads) of a larger problem (task). This principle guarantees enormous
computing capacities, but requires special architectures and software and is accordingly
expensive.

Since professional image processing is becoming widespread on the PC, multiprocessor
systems of this kind are being used to reduce the computing time for complicated operations. In
this case, the image is usually divided (tiling) into several (typically four) parts. Each processor
then calculates only one quarter of the entire image, which in theory would reduce
the computing time by a factor of four (ratio 4:1). In practice, however, due to practical
constraints, only values between 1.5:1 and 2.5:1 are obtained.
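
The gap between the theoretical and the observed gain can be expressed as a parallel efficiency,
that is, the achieved speed-up divided by the number of processors. The following minimal sketch
applies this to the figures quoted above; the function name is hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of the ideal linear speed-up that is actually achieved."""
    return speedup / processors

# Four processors, with the observed speed-ups of 1.5:1 and 2.5:1 quoted above.
low = parallel_efficiency(1.5, 4)
high = parallel_efficiency(2.5, 4)
print(f"{low:.1%} to {high:.1%} of the ideal 4:1 gain")
```

In other words, with tiling across four processors, well under two thirds of the added
computing capacity is turned into shorter computing time.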

In video processing as well, several applications support the division of the threads among
several processors. In addition, "accelerators" are available which have special processors. These
accelerators relieve the primary processor(s) and can compute special, adapted functions more
quickly. The architecture and the division of the threads are derived from image processing: video
is seen as a chain of individual images. The calculation of a time unit of a video signal, or of a
frame, is divided in this case among the processors of the accelerator and/or the primary
processors. As soon as the frame has been fully rendered, the next frame is computed, etc., until
finally all frames to be calculated are completed.

All these systems share the fact that the functions described will indeed proceed
without any wait time, but any additional function must be computed entirely by the primary
processor. This means that the computing time does not increase
proportionally (more complex effects accordingly require more computing time than simple
ones), but rather increases abruptly (anything which deviates only minimally from the real
time capability will require disproportionately long computing times).

Even though an editing system should be a tool which functions as the user expects,
the editor's possibilities are thereby restricted, and the user is compelled to take the
limitations of the system into account during his work. One effect or another will then not be
attainable (or not in the available time).

In addition, real time systems are naturally subject to the limitations of any hardware
developed for a specific purpose. The algorithms can only be parameterized within a fixed, defined
framework; any adaptation to a customer's wishes (e.g., higher quality) or technical refinements
(e.g., higher resolutions, HDTV) in most cases unavoidably requires a new development.

In order to bypass the problem that the functions outside of the real time capability take
such a disproportionately long time to calculate, these functions can be distributed across several
PC primary processors for calculation. This technology is gaining in importance precisely
because of the increasing power of the primary processors and of the hard disc systems. All
systems which support this technology divide the calculation of a single image; the next image
can only be calculated once the calculation of the preceding one is completed.

However, this technology has a number of limitations which are particularly prominent 
for video processing (large data volume, working with compressed data). Several of these 
disadvantages will be discussed below: 

The PCI bus is overloaded. When working with compressed video, each image must be
decompressed before the calculation. Manipulation of compressed layers is not possible. Since
compression and decompression are costly, dedicated chips are used for them; the
compressed image material must therefore first be transferred (via the PCI bus) to the video
hardware, and then the uncompressed material is moved (likewise along the PCI bus) into the main
memory. In the case of several layers, this process must occur for each source image. In addition,
for each individual computing step, a separate (temporary, uncompressed) copy of the image must be
placed in the main memory ("memcopy"); the result must then, in the reverse sequence, be
compressed on the video hardware again and stored on the hard disc. This spatial separation
increases the traffic on the PCI bus enormously. Since this bus has a maximum bandwidth of 132
MB/s, which cannot be fully utilized in practice, and is moreover shared by all components in
the PC (including graphics card, etc.), the transfer of the data through the system is one of the
main reasons for the long computing times.
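
The bus load described above can be estimated with a short back-of-the-envelope calculation. The
sketch below counts only the uncompressed transfers (one per source layer into the main memory,
plus one for the result returned to the video hardware). The frame format, the figure of 2 bytes
per image point and the function name are assumptions for illustration; only the 132 MB/s peak
value is taken from the text.

```python
# Back-of-the-envelope PCI load for the conventional approach described above.
# ASSUMPTIONS (not from the source): 720 x 576 frames at 25 fps, 2 bytes per
# image point, and only the uncompressed transfers are counted.
PCI_PEAK_BYTES_PER_S = 132_000_000  # theoretical maximum quoted in the text

def uncompressed_pci_load(layers, width=720, height=576, fps=25, bytes_per_pixel=2):
    """Bytes per second of uncompressed traffic: one transfer per source layer
    into main memory plus one transfer of the result back to the video hardware."""
    frame_bytes = width * height * bytes_per_pixel
    transfers_per_frame = layers + 1
    return transfers_per_frame * frame_bytes * fps

for layers in (1, 2, 4, 6):
    load = uncompressed_pci_load(layers)
    print(f"{layers} layer(s): {load / PCI_PEAK_BYTES_PER_S:.0%} of the theoretical peak")
```

Even without the compressed transfers, the temporary copies, or the traffic of other
components, a handful of layers would under these assumptions approach or exceed the theoretical
bandwidth of the bus in real time operation.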

Furthermore, the main processors are burdened during rendering. Modern operating
systems support the simultaneous execution of several programs (multitasking). Therefore, in
theory, work on the edit can continue during the rendering process. However, since the primary
processor is heavily loaded by the rendering process and by the administration of the data transport
in the method described, the two tasks will be competing. The operator will notice this because
the system will operate more slowly, that is, it will not respond with the usual speed.

Also, when several processors are used which operate in parallel on smaller parts of an 
image, these restrictions will still occur. In addition, the division of the image to the processors 
has to be controlled and it must be ensured that at the end, the computed parts are then 
reassembled correctly. 

An additional limitation is the (presently) restricted scalability of the systems. A 
maximum of 4 processors can be integrated at present into a system, without having to use 
highly specialized hardware. 

Finally, in the state of the art, special rendering accelerators are known for the purpose of 
reducing the traffic problem. This separate hardware consists of several processors and a lot of 
memory, which is connected to the processors by an extremely fast bus. Therefore, the problem 
of slower memory access through the primary processor (memcopies) and significant loading of 
the primary processor will be reduced. Solutions of this kind can indeed enable quick calculation
for individual functions (depending on the optimization), but the problem of bus overload due to
the transport of uncompressed data through the system will still persist.

Furthermore, this design is not indefinitely scalable, which is due primarily to the
fundamental architecture: Each image is divided into several sections which are then each
rendered by one processor. Only once all sections are rendered can the next image be taken into
the process. The size of the individual sections thus has a lower limit. Below a certain
size, any additional division will hardly make sense and the balance between computing time and
transport time will be lost. In addition, there are also physical limitations: Only a limited number
of processors will fit on a PCI card.

Another problem is the high specialization of the task. For each individual function the
division process has to be defined anew. A rotation of an image must be divided differently than
a scaling, for example, since otherwise rounding errors, which are unavoidable when
recomposing the parts, would produce an inconsistent overall image. Some functions are thus
less suitable than others for this architecture.

Therefore, the purpose of the present invention is to avoid these disadvantages of the
state of the art, and in particular to specify an apparatus and a method for processing
temporally sequential data packets which enable image processing at increased
processing speed and at the same time a lower cost implementation, in particular for digital
image processing.

This problem is solved, with regard to the apparatus, by the combination of features
specified in Claims 1 and 35. With regard to the method, the solution is provided by
Claim 27.

One advantage of the present invention consists in the optimum utilization of the 
bandwidth. Because only compressed data is transported along the PCI bus and no memcopies 
are stored in the main memory (since the calculations occur entirely in a processor/memory unit), 
the PCI bus is under significantly less load than for all other designs. Because neither the 
division of the tasks nor the rendering itself is carried out by the primary processor, the method is 
ideally suited for real multitasking, that is, unrestricted working with simultaneous rendering in 
the background. 

Furthermore, an additional advantage of the present invention consists in the simple and 
low cost design, and in its scalability. Because each processor is processing exactly one vertical 
frame, by addition of processors and/or pipelines, the number of the frames calculated 
simultaneously can be increased a priori. In this case, the design will only much later run up 
against the limit of the bandwidth of the PCI bus, since only compressed data is being 
transferred. Since the acceleration is achieved because the transport is significantly reduced and 
since calculation of several images is running simultaneously, the distribution of the data is very 
simple and need not be handled separately for each function. When a processor/memory unit has 
computed a frame, it then sends said frame on to the ring buffer for compression and requests the 
data for the entire, new vertical frame from the ring buffer after the decompression. 

An additional advantage of the present invention consists in that it is not dependent on 
the processor. In principle, the architecture according to the invention is independent of the used 
type of processor. 

Even a mixing of different processors presents no problem, provided it is assured that all
processors achieve exactly the same results under the same prerequisites.

Furthermore, it is an advantage of the invented design that the method or apparatus 
proposed according to the invention behaves quite naturally or logically with respect to the user. 
Complex effects require more computing time than simple ones; one layer more will mean 
proportionally more computing effort. In addition, the system requirements can be estimated in 
advance for a known processing job.

Finally, it is an advantage that video processing can be performed regardless of the
format, resolution and quality. Higher (or lower) resolution and different quality requirements or
new algorithms can be easily accommodated by changing the software stored in the processor. If the
compression and decompression are handled by dedicated hardware, then of course this
hardware will determine the boundary parameters of the format used.

Preferred embodiments of the invention are disclosed in the dependent patent claims. 

The invention, including additional properties and embodiments thereof, will be 
explained in greater detail below with reference to the associated figures. The same or similar 
reference symbols pertain to their associated elements everywhere in the illustrations. 
Specifically, the figures show: 

Figure 1 A schematic diagram to illustrate the invented apparatus for processing of 
temporally sequential data packets; 

Figure 2 A schematic diagram to illustrate the invented method for processing of 
temporally sequential data packets; 

Figure 3 A schematic illustration of temporally sequential data packets like those 
occurring in video processing; 

Figure 4 A highly simplified schematic illustration of an additional variant of the present 
invention; and 

Figure 5 A highly simplified schematic illustration of another variant of the present
invention.

Figure 1 presents a schematic, block diagram of the invented apparatus or module 1 to 
implement the invented design architecture of "pipelining with dispatcher." The apparatus 1 for 
processing of temporally discrete frames 100 (see Figures 2 and 3) of a video signal is connected 
by a PCI bus 2 to a CPU (not illustrated) of a personal computer. The apparatus 1 has a decoder 
3, a coder 4 and a plurality of local processors or rendering CPUs 5, which each exchange data 
with a local memory 6. In Figure 1 a processor 5 is represented schematically by a solid line, 
together with its associated memory 6. Additional, optional processors are denoted by reference 
numbers 5' and 5" and are illustrated by a dashed line. A memory 6' or 6" represented by dashed 
lines is each allocated to said processors. The processors 5, 5', 5"... are each connected to the 
memories 6, 6', 6" by a local bus (not illustrated) and can exchange data along it. The memories 
6, 6', 6" are used in particular for (interim) storage of results of operations conducted in the 
processors 5, 5' or 5". The decoding unit 3, the coding unit 4 and the processor 5 are connected
along a local bus 7 of the apparatus 1. It should be mentioned that, in contrast to the local bus 7,
only compressed data packets are passed along the PCI bus 2, so that it is under much less load
than in state of the art designs. The processor 5 (5', 5"...) has all rendering
programs needed for the rendering, and also a dispatcher program which will be explained in
greater detail below.

The operation of the invented apparatus 1 will be explained in greater detail below with 
reference to Figures 2 and 3. The processing sequence of the frame 100 of a video signal waiting 
for processing is represented by arrows in Figure 1. With the invented apparatus 1, scenes can be 
cut, changed and recombined, in particular in the reprocessing of video signals. In this case, the 
following manipulations in particular will occur (Figure 3): 

-Transition effects: A transition is created between two scenes, not with a hard cut, but by 
means of a more or less complex digital video effect (DVE). Some typical effects are, for 
example, fading or page turn. 

-Filter effects: A modified filter is applied to one (or several) scene(s); this filter acts on
the scene, or on all the scenes. Some typical effects are, for example, color correction, size
change (PiP, picture in picture), artificial filters (black/white, sepia, "aged film", etc.).

-Keying/masking: In order to be able to make portions of an image transparent with 
respect to the background, color keys or brightness keys or masks are used. The most important 
representative of these keys is the blue or green screen, in which filming occurs in front of a 
completely blue or green background, which will later be replaced by another background in the 
editing. 

-Subtitling: A title is placed over a portion or over the entire film, which appears either 
still, moving from left to right (crawl) or from bottom to top (roll). Name fade-in or final credits 
are frequently used types. 

At the end of a manipulation or of all manipulations of a time unit (frame), all layers
taken into account in this frame are computed successively in the processor 5 (5', 5"...). This
process is also called mixdown. It should be mentioned that all manipulations previously
described and also the mixdown are conducted exclusively in the transparent layer, i.e., in the
invented apparatus 1. A direct manipulation of compressed image data is not possible, which is
why, in systems that operate on compressed data, a decompression by the decoder 3 is always
carried out before the manipulation in the processors 5, 5', 5", followed by a renewed
compression by the coding unit 4. The execution of manipulations and the computing together of
all participating layers into one frame is known as rendering. The rendering is also performed
completely in the processor 5 (5', 5"...).
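
The mixdown described above, in which all layers of a frame are computed successively into one
image, can be sketched as follows. The text does not specify how layers are combined; the
representation of a layer as a list of (value, opacity) image points and the plain "over"
blending used here are assumptions chosen purely for illustration, as is the function name.

```python
# Minimal mixdown sketch: the layers of one frame are combined one after another.
# ASSUMPTIONS (not from the source): a layer is a list of (value, alpha) image
# points with alpha in [0, 1], blended with a plain "over" operation.

def mixdown(background, layers):
    """Composite each layer in order over a background of plain pixel values."""
    frame = list(background)
    for layer in layers:
        frame = [
            alpha * value + (1.0 - alpha) * below
            for (value, alpha), below in zip(layer, frame)
        ]
    return frame

background = [0.0, 0.0, 0.0]
scene = [(100.0, 1.0), (100.0, 1.0), (100.0, 1.0)]   # opaque scene layer
title = [(255.0, 0.0), (255.0, 0.5), (255.0, 1.0)]   # title with varying opacity
print(mixdown(background, [scene, title]))
```

Because each layer depends on the result of the layers beneath it, the computation for one frame
is inherently sequential, whereas different frames are independent of one another.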

In the present case a concept is used which ultimately is similar to a real time rendering, 
but technically it operates essentially in an asynchronous manner. The invented system therefore 
represents a hybrid technology which makes possible "in time" processing.

Figure 3 represents roughly one second of a video signal 50. The video signal 50 has
temporally discrete and sequential data packets, so-called frames 100. The frames 100, which
have the constant temporal spacing T, have different levels or layers with respect to the effects to
be created. A first frame 101 (Figure 3) is requested by the dispatcher program of the processor 5
from the primary processor via the PCI bus 2 (PULL) and along the local bus 7 of the apparatus 1.
Next, the decoder 3 decompresses the data packet 101. The data packet 101 has n layers pending
for processing. The individual layers 1 to n are decompressed separately on the decoder 3, which
is indicated schematically in Figure 2 by upward-pointing triangles. The individual layers 1 to n
are next processed in the processor 5, which is represented schematically by a hexagon. After the
manipulation of the decompressed layers, the results are also mixed in the processor 5. This
mixing is indicated schematically by the mixing symbol (circle with cross). The operations
occurring in the decoder 3 and the processor 5 are combined schematically in Figure 2 by a
curved bracket. Accordingly, the data packets designated 102 and 103 are decompressed in
parallel by the decoding unit 3 and are processed by the processors 5' or 5", respectively. The
uncompressed result of the processing by the processors 5, 5', 5" is saved in different memory
locations of the uncompressed buffer memory 9. Next, the processed data packets 101, 102, 103 are
compressed in this sequence by the coding unit 4 and are placed in a playback buffer memory 10
at sequential memory locations. The inverse of the decompression, that is, the compression
itself, is illustrated schematically in Figure 2 by a triangle with the peak pointing downward. The
PCI bus 2 can access these memory locations of the playback buffer memory 10.

Due to the dispatcher program stored in the processor 5, after processing of the data 
packet 101 the transfer to the buffer memory unit 9 (PUSH) is initiated. Since the processor 5 is 
thus again available for processing of an additional data packet, it will request (PULL) via the 
PCI bus 2, an additional later data packet for decompression by the decoder 3 and for subsequent 
processing. 
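
The PULL/PUSH behavior of the dispatcher program can be sketched with a small worker loop. This
is a minimal sketch under stated assumptions and not the implementation of the apparatus:
threads stand in for the processors 5, 5', 5", a queue stands in for the frames made available
by the primary processor, a dictionary stands in for the buffer memory 9, and the `decode` and
`render` functions are hypothetical placeholders. For brevity, decoding is called inside the
worker rather than in a separate decoder.

```python
import queue
import threading

# ASSUMPTIONS (not from the source): frames are (index, payload) tuples and
# decode/render are stand-in functions; all names here are hypothetical.

def decode(payload):
    return payload.lower()          # stand-in for decompression

def render(payload):
    return payload[::-1]            # stand-in for effects plus mixdown

def dispatcher(source, buffer, lock):
    """Run on each processor: PULL one whole frame, process it, PUSH the result."""
    while True:
        try:
            index, payload = source.get_nowait()   # PULL the next pending frame
        except queue.Empty:
            return                                  # nothing left to render
        result = render(decode(payload))
        with lock:
            buffer[index] = result                  # PUSH into the buffer slot

def run_pipeline(frames, processors=3):
    source = queue.Queue()
    for item in enumerate(frames):
        source.put(item)
    buffer, lock = {}, threading.Lock()
    workers = [threading.Thread(target=dispatcher, args=(source, buffer, lock))
               for _ in range(processors)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    # The coder reads the slots in frame order, whatever order they finished in.
    return [buffer[i] for i in range(len(frames))]

print(run_pipeline(["ABC", "DEF", "GHI", "JKL"]))
```

Since each worker requests its next frame itself as soon as it is free, no central scheduler has
to divide the work, and the distribution does not depend on the effect being computed.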

It should be mentioned that according to the invention, the decoding unit 3 and the coding
unit 4 decompress or compress data continuously. The comparatively much longer lasting process
of processing in the processors 5, 5', 5", 5"' is compensated by their number, so that after a
certain start-up phase, a quasi-real time processing is achieved, since the processed data packets
in the playback buffer memory unit 10 can be called up at the temporal interval T. Only under
some circumstances will there be a certain start-up delay, due to the calculation of the first
data packets, which can be accepted because of the numerous advantages of the invented
method.

From the illustration in Figure 2 we also see that, to increase further the efficiency of the
system proposed according to the invention, a plurality of pipelines like those constructed
according to Figure 1 can be used. These additional apparatus 1 likewise place their compressed
results in the playback buffer memory 10.
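
The condition for quasi-real time processing described above can be illustrated with a simple
estimate of how many processors (in one or several pipelines) are needed so that one finished
frame becomes available per interval T. The sketch assumes a constant rendering time per frame
and assumes that the decoder and coder are never the bottleneck; both are simplifications not
stated in the text, and the function name is hypothetical.

```python
def processors_needed(render_ms, interval_ms):
    """Smallest number of processors whose combined output keeps pace with T.

    ASSUMPTIONS: every frame takes render_ms milliseconds on one processor,
    and decoding and coding keep up; times are whole milliseconds."""
    return -(-render_ms // interval_ms)   # ceiling division on integers

T_MS = 40   # 625/50 systems: 25 frames per second, one frame every 40 ms

for render_ms in (30, 200, 201):
    print(render_ms, "ms per frame ->", processors_needed(render_ms, T_MS), "processor(s)")
```

Under these assumptions the start-up delay corresponds to the rendering time of the first
frame, after which finished frames follow at the interval T.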

Based on the highly schematic and simplified illustration of Figure 4, one additional
design embodiment of the present invention will be described below. Data packets 100 made
available from a primary processor by means of a job control routine on the system bus 2 are
requested in sequence by the individual processors 5, 5', 5" through the dispatcher program
(PULL) stored in these particular processors 5, 5', 5" (the data packets, which are sent
continuously and separated in time for processing, are represented by vertical dots). The processing
occurring in the processors 5, 5', 5"... is represented by hexagons and by circles provided with a
cross. Intermediate results or interim layers generated during the processing or calculations are
stored in a memory 6, 6' or 6", respectively, allocated to the particular processors 5, 5', 5". From
the representation of Figure 4 we see that for decompression of the compressed data packets 100,
there are specific decoding devices 3, 3', 3" which in this design embodiment form a unit
together with the particular processors 5, 5', 5". In this design embodiment, the processors
5, 5', 5" thus have not only a rendering program and a program for adding up all
computed layers, but also a decoding or decompression program. From the horizontal dots
in Figure 4 we see that the number of processors 5, 5', 5" essentially does not have an upper
limit. Of course, the number of processors 5, 5', 5" used (and of the associated memories 6, 6',
6") is determined by a system design which makes it possible (on average) to guarantee a timely
quasi-real time processing of the data packets 100. In this case, it is preferable to design the
system so that it has an excess processing capacity. After the processing of the data packets 100,
they are then coded or compressed again by the individual coding unit 4, and delivered by the
processors 5, 5', 5" via the PUSH functionality of the dispatcher program stored in the processors
5, 5', 5". From Figure 4 it is evident that the different vertical positions of the processors 5, 5', 5"
indicate their relative processing state. That is, the processor 5" is farther advanced in the
processing of a data packet 100 than the processor 5', and the latter, in turn, is farther along than
the processor 5. Even though Figure 4 illustrates that the invented apparatus 1 has only one
single coding device 4 and a plurality of processors 5, 5', 5" which have an integrated decoding
function 3, 3', 3", this in no way limits the invention. It should be mentioned that, as Figure 1
illustrates, only one single decoding device 3 can be used, which in such case would be designed
as structurally separate from the processors 5, 5', 5". Furthermore, a plurality of coding devices 4
can be provided which code or compress the processed data packets 100, especially in parallel.
But for engineering reasons and for cost reasons it is preferred to use fewer coding devices
4 than decoding devices 3. Furthermore, from the illustration in Figure 4, and also in the
illustration of Figure 3, we see that a plurality of identical or modified apparatus 1 together form
a configuration according to the invention. A configuration of this kind, with one or more
apparatus 1, can be an advantage for reasons of the required processing capacity.

From the illustration in Figure 5 we see one additional design embodiment of the present 
invention. The variant illustrated in Figure 5 differs from the design embodiment described 
previously with respect to Figure 4, essentially in that only one decoding device 3 is provided for 
decompression of the data packets 100. The decoding device 3 in the design embodiment of the 
invention according to Figure 5 is a unit structurally separate from the processors 5, 5', 5". As 
has already been remarked above, several decoding devices 3 and coding devices 4 can also be
used according to the invention, and they can then be operated independently of each other or in
cooperation with each other. Furthermore, the decoding device(s) 3 and/or the coding device(s) 4 
can also be fully or partly integrated into the processors 5, 5', 5" by means of hardware and/or 
software. 

The invention has been described above based on one preferred design embodiment.
However, for a person of ordinary skill in the art, it is evident that different modifications
and variants can be made without deviating from the ideas underlying the invention. The basic
ideas of the present invention can be summarized, without limitation, as follows:

-Use of the temporally discrete structure of video (number of sequential images of the 
same size and same run time). 

-Integration of the functional units of decompression, processor, memory and
compression into a pipeline, whereby also several processors, each with a separate memory, can
operate within the pipeline.

-Limitation to the calculation of precisely one vertical frame per processor per rendering 
task (the image is not additionally subdivided) with the use of a dispatcher program, which 
independently can request the entire (compressed) data for the next vertical frame (PULL) and 
can transfer the final, rendered frame, independently, back to the hard disc controller (PUSH). 
All render routines and also the dispatcher program are present in each processor. 

In practice, rendering according to the invented concept can run completely in the background,
fully independent of the primary processor. In this case, the video (in 625/50 Hz systems) will always
be replayed at 25 images per second of run time. The user must then wait just long enough until
the first frame has been calculated. As long as the render pipeline can render faster than
25 images per second (which in the vast majority of cases is possible on average across a
particular time span), the user will not notice any delay. The pipeline makes, so to speak, each
individual rendered frame immediately ready for replay. If the distribution of the
frames is intelligent, so that frames in the immediate vicinity of the current
position are rendered first, then the user will not notice any difference in handling with respect to a
real time system.
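
The idea of rendering first those frames which lie closest to the current position can be
sketched as a simple ordering of the pending frame numbers. The tie-breaking rule used here
(a later frame is preferred over an earlier one at the same distance, since it will be reached
sooner during replay) is an assumption for illustration, as is the function name.

```python
def render_order(pending_frames, current_position):
    """Order pending frame numbers so that those nearest the current position come first.

    ASSUMPTION: at equal distance, the frame after the current position is
    preferred over the one before it."""
    return sorted(
        pending_frames,
        key=lambda frame: (abs(frame - current_position), frame < current_position),
    )

print(render_order(range(0, 10), 4))
```

The dispatcher programs would then request the frames in this order rather than strictly from
the start of the sequence.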


List of reference symbols

1            Apparatus or module
2            System bus or PCI bus
3            Decoding device
4            Coding device
5, 5', 5"    Processor or rendering CPU
6, 6', 6"    Memory device
7            Local bus
9            Buffer memory unit
10           Playback buffer memory unit
50           Video signal
100          Data packet or frame
101          Data packet or frame
102          Data packet or frame
T            Temporal spacing between data packets

Claims

1. Apparatus (1) for processing of temporally sequential data packets (101, 102, 103)
with at least one decoding device (3), a plurality of processor devices (5, 5', 5"...) with memory
devices (6, 6', 6"...) each allocated to one processor device (5, 5', 5"...) and at least one coding
device (4), characterized in that one data packet (101, 102, 103) is processed by each processor
device (5, 5', 5"...).

2. Apparatus (1) according to Claim 1, characterized in that each data packet (101, 102, 
103) is a time unit or frame of a video signal. 

3. Apparatus (1) according to Claim 1 or 2, characterized in that the data packet (101, 
102, 103) has the entire layer packet of a frame. 

4. Apparatus (1) according to one of the preceding claims, characterized in that the at
least one decoding device (3), the plurality of processor devices (5, 5', 5"...) and the at least one
coding device process the data packets (101, 102, 103) in this sequence, in particular as a kind of
pipeline.

5. Apparatus (1) according to one of the preceding claims, characterized in that each
unprocessed and coded data packet (101, 102, 103) is requested along a bus from a primary
processor by one processor device (5, 5', 5"...) each.

6. Apparatus (1) according to Claim 5, characterized in that the same processor device (5, 
5', 5"...) requests and processes the data packet (101, 102, 103). 

7. Apparatus (1) according to one of the preceding claims, characterized in that a 
processed and coded data packet (101, 102, 103) is delivered along a bus to a primary processor. 

8. Apparatus (1) according to one of Claims 5-7, characterized in that the bus is a system 
bus (2) and the primary processor is a CPU of a personal computer. 

9. Apparatus (1) according to one of the preceding claims, characterized in that the
processor device (5, 5', 5"...) has a program for rendering and sequential computing of all layers.

10. Apparatus (1) according to one of the preceding claims, characterized in that the 
processor device (5, 5', 5"...) has a sequence control or allocation program which requests a data 
packet (101, 102, 103) from the system bus, in particular by means of the decoding device (3). 

11. Apparatus (1) according to Claim 10, characterized in that the sequence control or 
allocation program triggers the delivery of the processed data packet (101, 102, 103) to the 
primary processor, in particular by means of the coding device (4). 

12. Apparatus (1) according to one of Claims 5-11, characterized in that the primary 
processor has a job control program to hold the data packets (101, 102, 103) ready on the system 
bus (2). 

13. Apparatus (1) according to one of Claims 5-12, characterized in that the primary 
processor has a fetch program to fetch the data packets (101, 102, 103) from the system bus (2). 

14. Apparatus (1) according to one of the preceding claims, characterized in that the 
plurality of processor devices (5, 5', 5"...) each processes one of the sequential data packets (101, 
102, 103) essentially in parallel with additional processor devices from the plurality of processor 
devices (5, 5', 5"...). 

15. Apparatus (1) according to one of the preceding claims, characterized in that the data 
packets (101, 102, 103) have a constant timing interval (T). 

16. Apparatus (1) according to Claim 15, characterized in that the number of parallel 
operated processor devices (5, 5', 5"...) is essentially the quotient of a sum of an average coding 
time or decoding time, respectively, and an average processing time of a data packet (101, 102, 
103) divided by the timing interval (T). 

17. Apparatus (1) according to one of the preceding claims, characterized in that the
temporally sequential data packets (101, 102, 103) are coded.

18. Apparatus (1) according to Claim 17, characterized in that the temporally sequential
data packets (101, 102, 103) are compressed.

19. Apparatus (1) according to Claim 18, characterized in that the at least one decoding 
device (3) is provided for the decompression of compressed data packets (101, 102, 103) and that 
the at least one coding device (4) is provided for the compression of uncompressed data packets 
(101, 102, 103) being processed uncompressed by a processor (5, 5', 5" ...). 

20. Apparatus (1) according to one of the preceding claims, characterized in that the at
least one decoding device (3), the plurality of processors (5, 5', 5"...) and the at least one coding
device (4) are connected by a local bus (7).

21. Apparatus (1) according to Claim 20, characterized in that the uncompressed data are 
transported exclusively along the local bus (7), whereas only compressed data are transported 
along the system bus (2). 

22. Apparatus (1) according to one of the preceding claims, characterized in that the 
processed and compressed data packets (101, 102, 103) are interim stored in a playback buffer 
memory device (10). 

23. Apparatus (1) according to Claim 22, characterized in that the primary processor has 
a fetch program to fetch the data packets (101, 102, 103) from the playback buffer memory 
device (10) to the system bus (2). 

24. Apparatus (1) according to one of the preceding claims, characterized in that each of 
the plurality of processors (5, 5', 5" ...) has a decoding program and/or a coding program. 

25. Apparatus (1) according to Claim 24, characterized in that each of the plurality of
processors (5, 5', 5" ...) is integrated with one decoding device (3) each.

26. Apparatus (1) according to Claim 24 or 25, characterized in that each of the plurality 
of processors (5, 5', 5" ...) is integrated with one coding device (4) each. 

27. Method for processing of temporally sequential data packets, in particular for use 
with an apparatus (1) according to one of the preceding claims, characterized by the following 
steps: 

(i) decoding of one (101) of the temporally sequential data packets; 

(ii) processing of the data packet (101) by a processor device (5); and 

(iii) coding of the processed data packet (101). 

28. Method according to Claim 27, characterized in that an additional one (102) of the
temporally sequential data packets is processed by an additional processor (5'), whereby the
decoding and coding are carried out by the same decoding device (3) and coding device (4).


29. Method according to Claim 28, characterized in that one additional data packet (102)
is requested by one additional processor (5') until a processor (5) previously implementing step
(ii) has finished the processing of a data packet (101).

30. Method according to Claim 29, characterized in that steps (i) to (iii) are carried out in 
parallel with different data packets (101, 102, 103). 

31. Method according to one of the preceding claims, characterized in that steps (i) to (iii)
are carried out continuously.

32. Method according to one of the preceding claims, characterized in that before step (i),
a pending data packet (101, 102, 103) is requested by a processor (5, 5', 5" ...) which is not
conducting step (ii) at that moment.

33. Method according to one of the preceding claims, characterized in that the processor
(5, 5', 5" ...) after step (ii) delivers a processed data packet to a coding device (4) for
implementation of step (iii).

34. Method according to one of the preceding claims, characterized in that already 
processed data packets are further processed, in particular, displayed, while the following data 
packets are still in processing. 

35. A configuration for processing of temporally sequential data packets (101, 102, 103)
with a plurality of apparatuses (1), each of which has at least one decoding device (3), at least one
processor (5) with a memory device (6) allocated to each processor (5), and at least one coding
device (4), characterized in that each processor device (5) processes one data packet (101, 102, 103).
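
The dimensioning rule of Claim 16 can be illustrated with a short worked calculation. The sketch below takes the sum of the average coding or decoding time and the average processing time of a data packet and divides it by the timing interval (T). The function name, the millisecond units and the rounding up to a whole number of processors are assumptions; the claim itself only states that the number is "essentially" this quotient.

```python
import math


def processors_needed(codec_ms: float, processing_ms: float, interval_ms: float) -> int:
    """Number of parallel processor devices per Claim 16, rounded up (assumption).

    codec_ms      -- average coding or decoding time per data packet
    processing_ms -- average processing (rendering) time per data packet
    interval_ms   -- timing interval T between successive data packets
    """
    if interval_ms <= 0:
        raise ValueError("the timing interval T must be positive")
    return math.ceil((codec_ms + processing_ms) / interval_ms)
```

With T = 40 ms (25 frames per second), an average coding time of 10 ms and an average processing time of 110 ms, the quotient is (10 + 110) / 40 = 3, so three processor devices would keep pace with the incoming frames under these assumed timings.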